24 research outputs found

A multi-layered Bayesian network model for structured document retrieval

Author: F. Crestani
G. Bordogna
G. Salton
H.R. Turtle
J. Vegas
L.M. de Campos
M. Lalmas
S. Acid
T. Roelleke
Y. Chiaramella
Publication date: 01/01/2003

New standards in document representation, such as SGML, XML, and MPEG-7, compel Information Retrieval to design and implement models and tools to index, retrieve and present documents according to the given document structure. The paper presents the design of an Information Retrieval system for structured multimedia documents, such as journal articles, e-books, and MPEG-7 videos. The system is based on Bayesian Networks, since this class of mathematical models makes it possible to represent and quantify the relations between the structural components of the document. Some preliminary results on the system implementation are also presented.

Crossref

University of Strathclyde Institutional Repository

DB&IR Integration: Report on the Dagstuhl Seminar "Ranked XML Querying"

Author: Amer-Yahia S.
Hiemstra Djoerd
Roelleke T.
Srivastava D.
Weikum G.
Publication venue: Dagstuhl
Publication date: 01/01/2008

University of Twente Research Information

Ranking structured documents using utility theory in the Bayesian network retrieval model

Author: F. Crestani
G. Bordogna
G. Kazai
L.M. de Campos
M. Lalmas
R. Baeza-Yates
R.D. Shachter
S. Acid
S. French
T. Roelleke
Y. Chiaramella
Publication date: 01/01/2003

In this paper, a new method based on Utility and Decision theory is presented to deal with structured documents. The aim of applying these methodologies is to refine a first ranking of structural units, generated by means of an Information Retrieval Model based on Bayesian Networks. Units are re-ranked by combining their posterior probabilities, obtained in the first stage, with the expected utility of retrieving them. The experiments were carried out on the Shakespeare structured collection, and the results show that the new approach improves retrieval effectiveness.

CiteSeerX

Crossref

University of Strathclyde Institutional Repository

A systematic approach to normalization in probabilistic models

Author: Aldo Lipani
Allan Hanbury
Ben He
G Amati
Gerard Salton
K. Church
Mihai Lupu
S Robertson
T Roelleke
Thomas Roelleke
Publication venue: Springer Science and Business Media LLC
Publication date: 01/12/2018

Open access funding provided by Austrian Science Fund (FWF). This research was partly supported by the Austrian Science Fund (FWF) Project Number P25905-N23 (ADmIRE). This work has been supported by the Self-Optimizer project (FFG 852624) in the EUROSTARS programme, funded by EUREKA, the BMWFW and the European Union.

Crossref

UCL Discovery

Queen Mary Research Online

Towards a Better Understanding of the Relationship between Probabilistic Models in IR

Author: C. Zhai
C.D. Manning
D.W. Hosmer
F. Crestani
J. Lafferty
J.M. Ponte
K. Spärck-Jones
N. Fuhr
R.W.P. Luk
S.E. Robertson
T. Roelleke
V. Lavrenko
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2011

Probability of relevance (PR) models are generally assumed to implement the Probability Ranking Principle (PRP) of IR, and recent publications claim that PR models and language models are similar. However, a careful analysis reveals two gaps in the chain of reasoning behind this statement. First, the PRP considers the relevance of particular documents, whereas PR models consider the relevance of any query-document pair. Second, unlike PR models, language models consider draws of terms and documents. We bridge the first gap by showing how the probability measure of PR models can be used to define the probabilistic model of the PRP. Furthermore, we argue that, given the differences between PR models and language models, the second gap cannot be bridged at the probabilistic model level. We instead define a new PR model based on logistic regression, whose score function is similar to that of the query likelihood model. The performance of the two models is strongly correlated, which provides a bridge for the second gap at the functional and ranking level. Understanding language models in relation to logistic regression models opens up new research directions, which we propose as future work.

Crossref

Ghent University Academic Bibliography

Flexible and efficient IR using array databases

Author: A. Eisenberg
A. Trotman
Arjen P. de Vries
C. Galindo-Legaria
D.A. Grossman
G. Graefe
G.H. Golub
H. Turtle
I.H. Witten
L.A. Barroso
M.F. Porter
Marcin Zukowski
P. Buneman
Peter Boncz
Roberto Cornacchia
S.E. Robertson
Sándor Héman
T. Grabs
T. Roelleke
U.S. Chakravarthy
V.N. Anh
Publication venue: Springer Science and Business Media LLC

Crossref

Opinion-aware retrieval models based on sentiment and intensity of lexical features

Author: Bahrani M
Roelleke T
Publication venue: IOS Press
Publication date: 29/10/2021

Sentiment analysis has received much attention in Information Retrieval (IR) and other domains, including data mining, machine learning and NLP. However, when it comes to big data, incorporating the sentiment of words into IR models becomes even more important, and as yet no widely accepted standard exists for this task. The contribution of this paper is a framework for quantifying term frequency (TF) variants with sentiments. We propose models derived from the strength of lexical features to improve sentiment-based ranking.

Queen Mary Research Online

Explicitly considering relevance within the language modeling framework

Author: Azzopardi L.
Roelleke T.
Publication date: 01/01/2007

Whilst the event of relevance is central to the Binary Independence Retrieval model, Language Modeling focuses on the estimation of the document model. In this paper, we review the different past formulations of the Language Modeling (query likelihood) approach. We find that these previous formulations largely ignore relevance by making implicit or explicit assumptions. The main contribution of this work is an alternative formulation that specifically relates relevance and language modeling in a sound probabilistic framework. This leads to valuable insights into the application of Language Modeling to Information Retrieval, including how the approach handles relevance information and how the approach can be further developed.

Enlighten

ADOR: A New Medical Dataset for Sentiment-based IR

Author: Bahrani M
Roelleke T
Publication date: 01/01/2021

Sentiment analysis has received attention in retrieval applications. Combining opinions such as user feelings with semantics would enhance the performance of these applications, especially when the level of urgency is essential, e.g., in the medical domain. However, no widely used medical benchmark is known for evaluating sentiment-aware IR. In this paper, we create a dataset based on Amazon reviews of medical products and make it publicly available. To assess the compatibility of the benchmark with opinions and concepts, we propose a sentiment-aware extension of TF.IDF and apply it to the dataset. This model is derived from linear combinations of sentiment-based TF.IDF scores with term-based and conceptual TF.IDF scores. The benchmark could help healthcare organizations to effectively detect, rank and filter the most urgent notifications based on patients' health status, narratives and conditions.

Queen Mary Research Online

A Descriptive Approach to Classification

Author: A. Hunter
C. Cumbo
F. Sebastiani
H. Nottelmann
T. Roelleke
T. Rolleke
Publication date: 01/01/2011

Information systems are now required to be more adaptable and flexible than before in order to deal with the rapidly increasing quantity of available data and changing information needs. Text Classification (TC) is a useful task that can help to solve different problems in different fields. This paper investigates the application of descriptive approaches for modelling classification. The main objectives are increasing abstraction and flexibility so that expert users are able to customise specific strategies for their needs. The contribution of this paper is two-fold. Firstly, it illustrates that modelling classifiers in a descriptive approach is possible and leads to definitions that stay close to the mathematical formulations. Moreover, the automatic translation from PDatalog to mathematical formulation is discussed. Secondly, quality and efficiency results demonstrate the feasibility of the approach for real-scale collections.

CiteSeerX

Crossref